Investigation of Distributed Search Engine Based on Hadoop

نویسنده

  • Ning Chen
چکیده

This paper begins with a review on the research status of search engine, followed by discussion on goals of search engine, and then the principle of distributed computing is explained. Consequently the MapReduce distributed computing model and the Hadoop distributed file system (HDFS) are analyzed in detail. Finally the distributed search engine architecture is presented. On the basis of the architecture, future challenges and opportunities of the distributed search engine are highlighted.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The efficient implementation of distributed indexing with Hadoop for digital investigations on Big Data

Big Data brings new challenges to the field of e-Discovery or digital forensics and these challenges are mostly connected to the various methods for data processing. Considering that the most important factors are time and cost in determining success or failure of digital investigation, the development of a valid indexing method for efficient search should come first to more quickly and accurat...

متن کامل

The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine

Nowadays, the size of the Internet is experiencing rapid growth. As of December 2014, the number of global Internet websites has more than 1 billion and all kinds of information resources are integrated together on the Internet , however,the search engine is to be a necessary tool for all users to retrieve useful information from vast amounts of web data. Generally speaking, a complete search e...

متن کامل

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...

متن کامل

Improving search results in life science by recommendations based on semantic information

The management and handling of big data is a major challenge in the area of life science. Beside the data storage, information retrieval methods have to be adapted to huge data amounts as well. Therefore we present an approach to improve search results in life science by recommendations based on semantic information. In detail we determine relationships between documents by searching for shared...

متن کامل

A New Model of Search Engine based on Cloud Computing

With the rapid increase of websites and internet users, the traditional search engine will face great challenge in the real-time search, response speed and the storage of mass pages. However, the search engine deployed in the cloud can solve these shortcomings due to cloud computing with two major advantages in mass data processing and mass data storage. By analyzing the open-source cloud compu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014